Conway Venn Diagram (drewconway.com).
Source: Harvard Business Review.
Vitruvian triangle (venustas, firmitas, utilitas)
library(tidyverse)  # provides %>%, select(), mutate(), pivot_longer() and ggplot()
library(readxl)

reservoirs <- read_excel("reservoirs.xlsx", sheet = 1)

# Derive ERV, parse the dates, reshape to long format and plot each series
reservoirs %>%
  select(Date, River_Flow, Natural_Flow) %>%
  mutate(ERV = ifelse(Natural_Flow <= 8, 4, Natural_Flow / 2),
         Date = as.Date(Date, format = "%d/%m/%Y")) %>%
  pivot_longer(-Date,
               values_to = "Flow",
               names_to = "Source") %>%
  ggplot(aes(Date, Flow, col = Source)) +
  geom_line()

r4h2o.Rproj
https://github.com/pprevos/r4h2o/

Arithmetic operators (+ - * ^ %% %/%)
Assignment (<- versus =)
Functions: sum(), prod(), abs(), log(a, base = b)
= sign

## [1] -13
## [1] 0.01767146
## [1] 150
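As a quick check of the operators and functions listed above, the following sketch can be pasted into the console. The numbers are arbitrary illustrations rather than values from the course material, apart from the 150 mm pipe diameter used in the plotting example.

```r
# Integer division and remainder
17 %/% 5           # 3
17 %% 5            # 2

# Exponentiation binds tighter than the unary minus
-2^2               # -4
(-2)^2             # 4

# Built-in functions
sum(1:10)          # 55
prod(1:5)          # 120
abs(-13)           # 13
log(8, base = 2)   # 3

# Both <- and = assign at the top level; <- is the usual convention
diameter <- 150
area = (pi / 4) * (diameter / 1000)^2
area               # 0.01767146
```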
diameter <- 50:350
pipe_area <- (pi / 4) * (diameter / 1000)^2
plot(diameter, pipe_area, type = "l", col = "blue", main = "Pipe Section Area")
abline(v = 150, col = "grey", lty = 2)
abline(h = (pi / 4) * (150 / 1000)^2, col = "grey", lty = 2)
points(150, (pi / 4) * (150 / 1000)^2, col = "red")

First computer bug (1947)

_ or . or camelCase

\[Q = \frac{2}{3} C_d \sqrt{2g} \; lh^\frac{3}{2}\]
Create an R script and answer:
\(Q = \frac{2}{3} C_d \sqrt{2g} \; lh^\frac{3}{2}\)
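
One way to turn the formula into a reusable R function is sketched below. The symbol meanings are assumptions, since the slide does not define them: C_d is taken as a dimensionless discharge coefficient, g as gravitational acceleration, l as the weir width and h as the head above the crest, both in metres. The function name `weir_flow()` and the input values are hypothetical.

```r
# Flow over a weir: Q = (2/3) * Cd * sqrt(2g) * l * h^(3/2)
# Cd: discharge coefficient (assumed default of 0.6)
# l:  weir width (m), h: head above the crest (m), g: gravity (m/s^2)
# Returns the flow in cubic metres per second
weir_flow <- function(h, l, Cd = 0.6, g = 9.81) {
  (2 / 3) * Cd * sqrt(2 * g) * l * h^(3 / 2)
}

# Hypothetical inputs, chosen only to illustrate the call
weir_flow(h = 0.1, l = 0.5)

# The function is vectorised, so several heads can be evaluated at once
heads <- c(0.05, 0.10, 0.15)
weir_flow(h = heads, l = 0.5)
```

Because R arithmetic is vectorised, the same function works for a single measurement or for a whole column of a data frame.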
library(dplyr)
dplyr::filter()
readr package for CSV files (part of Tidyverse)
read_csv(): faster alternative for read.csv()

Character ("abcd")
Date ("2022-03-01")
Factor ("Male", "Female", "Other")
Logical (TRUE, FALSE)
Scalar, vector and data frame / tibble (matrix)

df[rows, columns]
df$column
glimpse(df)

gormsey[12:13, ]
gormsey[, 4:5]
gormsey[1:2, c(2, 4:5, 6)]
gormsey[1:2, c(-1, -3, -7)]
gormsey$Date[1:6]

What is the sample number of the last sample in the Gormsey data?
Hint: use the nrow() function.
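
The indexing pattern behind the hint can be tried on a small stand-in data frame first. The `samples` data frame below is invented for illustration and is not the Gormsey data; only the column names loosely follow it.

```r
# Hypothetical stand-in for the Gormsey data (all values invented)
samples <- data.frame(
  Sample_No = c(101, 102, 103, 104),
  Town      = c("Merton", "Southwold", "Merton", "Tarnstead"),
  Result    = c(0.12, 0.30, 0.05, 0.21)
)

nrow(samples)                       # number of rows: 4
samples[nrow(samples), ]            # last row, all columns
samples$Sample_No[nrow(samples)]    # sample number in the last row: 104
samples[1:2, c("Town", "Result")]   # first two rows, two named columns
```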
Compare variables
## [1] 2422
## [1] "Chlorine Total" "E. coli" "Turbidity" "THM"
##
## Chlorine Total E. coli THM Turbidity
## 760 760 168 734
## # A tibble: 4 × 2
## Measure n
## <chr> <int>
## 1 Chlorine Total 760
## 2 E. coli 760
## 3 THM 168
## 4 Turbidity 734
Write a script to answer these questions:
turbidity <- filter(gormsey, Measure == "Turbidity")
mean(turbidity$Result)
sum(turbidity$Result) / length(turbidity$Result)
median(turbidity$Result)

Calculating the mode of continuous data is complex and requires a specialised package.
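
To illustrate why the mode is the awkward one, the sketch below compares the three measures on invented values. For repeated (discrete or rounded) values the mode is simply the most frequent value. For continuous data, one rough base R alternative to a specialised package is to take the peak of a kernel density estimate; this is a swapped-in approximation and not the method the course uses.

```r
# Invented turbidity-like values, for illustration only
x <- c(0.1, 0.2, 0.2, 0.2, 0.3, 0.3, 1.5)

# Mean calculated two ways, and the median
mean(x)
sum(x) / length(x)
median(x)            # 0.2

# Mode of repeated values: the most frequent one
counts <- table(x)
mode_discrete <- as.numeric(names(counts)[which.max(counts)])
mode_discrete        # 0.2

# Rough mode estimate for continuous data: location of the density peak
d <- density(x)
mode_density <- d$x[which.max(d$y)]
mode_density
```

The density-based estimate depends on the bandwidth that `density()` selects, so it should be read as an indication rather than an exact value.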
`variable name`
na.rm = TRUE option in the sum() function
top_n() function to list the top five years

library(tidyverse)
bom <- read_csv("casestudy1/IDCJAC0009_088110_1800_Data.csv")
bom_year <- group_by(filter(bom, Year != 2022), Year)
annual_rain <- summarise(bom_year,
TotalRain = sum(`Rainfall amount (millimetres)`,
na.rm = TRUE))
top_n(annual_rain, 5)

## # A tibble: 5 × 2
##    Year TotalRain
##   <dbl>     <dbl>
## 1  1973      970 
## 2  1974      796 
## 3  1993      802.
## 4  2010      962.
## 5  2011      830.
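
The effect of the na.rm option is easiest to see on a small example. The data frame below is invented and is not Bureau of Meteorology data; the base R `tapply()` call stands in for the `group_by()` and `summarise()` steps used above.

```r
# Invented daily rainfall with one missing reading (not BOM data)
rain <- data.frame(
  Year     = c(2020, 2020, 2020, 2021, 2021),
  Rainfall = c(1.2, NA, 3.0, 0.4, 2.6)
)

sum(rain$Rainfall)                  # NA: a single missing value makes the total NA
sum(rain$Rainfall, na.rm = TRUE)    # 7.2

# Annual totals, ignoring missing values within each year
annual <- tapply(rain$Rainfall, rain$Year, sum, na.rm = TRUE)
annual
```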
Jackson Pollock (1952) Blue Poles number 11. Drip painting in enamel and aluminium paint with glass on canvas (National Gallery, Canberra).
Piet Mondrian (1928) Composition with red, yellow and blue. Oil on canvas (Municipal Museum, The Hague).
plot()
barplot()
hist()
boxplot()

pch: the plotting symbol (default is an open circle)
lty: the line type (default is a solid line); can be dashed, dotted, etc.
lwd: the line width, specified as an integer multiple
col: the plotting color, specified as a number, string, or hex code
xlab: character string for the x-axis label
ylab: character string for the y-axis label

library(tidyverse)
gormsey <- read_csv("casestudy1/gormsey.csv")
turbidity <- filter(gormsey, Measure == "Turbidity")
plot(turbidity$Date, turbidity$Result,
type = "l",
xlab = "Date",
ylab = "Result",
main = "Turbidity measurements")
abline(h = 5, col = "red")
boxplot(log10(Result) ~ Town, data = turbidity,
pch = 19, las = 3, col = "brown",
main = "Turbidity measurements")
abline(h = log10(5), col = "red")
p95 <- summarise(group_by(turbidity, Town),
p95 = quantile(Result, 0.95))
barplot(p95$p95, names.arg = p95$Town)
abline(h = 5, col = "red")

thm <- filter(gormsey, Measure == "THM")
thm_grouped <- group_by(thm, Date)
thm_max <- summarise(thm_grouped, thm_max = max(Result))
ggplot(thm_max, aes(Date, thm_max)) +
geom_smooth(method = "lm") +
geom_line() +
geom_hline(yintercept = 0.25, col = "red")

Practice Task: Convert this visualisation to a faceted graph to show the trend per system.
ggplot(turbidity, aes(Town, Result)) +
geom_boxplot() +
scale_y_log10(name = "Samples (log)",
n.breaks = 10) +
coord_flip()

scale_x_log10(): Logarithmic scale.
scale_x_discrete(): Discrete variables (names).
scale_x_continuous(): Continuous variables, such as measurements.
scale_x_date(): For displaying dates and times.

ggplot(turbidity, aes(Date, Result)) +
geom_area(col = "dodgerblue", fill = "dodgerblue") +
facet_wrap(~Town, ncol = 1) +
theme_void(base_size = 24)
ggplot(gormsey, aes(Measure)) +
geom_bar() +
theme(axis.text.x = element_text(angle = 90))
ggsave("casestudy1/measures.png", width = 15, height = 10, dpi = 300, units = "cm")

chlorine <- filter(gormsey, Measure == "Chlorine Total")
chlorine_gr <- group_by(chlorine, Town)
chlorine_avg <- summarise(chlorine_gr, avg = mean(Result))
ggplot(chlorine, aes(Date, Result)) +
geom_line() +
facet_wrap(~Town) +
geom_hline(data = chlorine_avg, aes(yintercept = avg), col = "blue", lty = 2) +
theme_minimal() +
labs(title = "Average total chlorine levels in Gormsey",
x = NULL, y = "Total Chlorine")

https://raw.githubusercontent.com/rstudio/cheatsheets/master/data-visualization-2.1.pdf
http://www.bom.gov.au/climate/data/stations/

`variable name`
na.rm = TRUE option in the sum() function
top_n() function to list the top ten years

library(tidyverse)
bom <- read_csv("casestudy1/IDCJAC0009_088110_1800_Data.csv")
bom_year <- group_by(filter(bom, Year != 2022), Year)
annual_rain <- summarise(bom_year,
TotalRain = sum(`Rainfall amount (millimetres)`,
na.rm = TRUE))
top10 <- top_n(annual_rain, 10)
ggplot(annual_rain, aes(Year, TotalRain)) +
geom_col()

The regulator for water quality has released a new guideline that lowers the maximum value for trihalomethanes (THMs) at the customer tap to 0.20 mg/l. This report assesses the historical performance of the Gormsey water system to evaluate the risk of non-compliance, assuming no operational changes are implemented.
library(readr)
library(dplyr)
library(ggplot2)  # needed for the ggplot() call further down
gormsey <- read_csv("casestudy1/gormsey.csv")
thm <- filter(gormsey, Measure == "THM")
glimpse(thm)

## Rows: 168
## Columns: 7
## $ Sample_No <dbl> 608188, 618273, 620904, 629974, 623529, 638727, 659889, 6…
## $ Date <date> 2069-01-11, 2069-01-16, 2069-01-16, 2069-01-23, 2069-01-…
## $ Sample_Point <chr> "BL_15694", "SO_12411", "SW_17608", "TA_16763", "ME_15385…
## $ Town <chr> "Blancathey", "Southwold", "Swadlincote", "Tarnstead", "M…
## $ Measure <chr> "THM", "THM", "THM", "THM", "THM", "THM", "THM", "THM", "…
## $ Result <dbl> 0.00300, 0.00300, 0.02100, 0.00300, 0.02600, 0.00300, 0.0…
## $ Units <chr> "mg/L", "mg/L", "mg/L", "mg/L", "mg/L", "mg/L", "mg/L", "…
ggplot(thm, aes(Town, Result)) +
geom_boxplot() +
geom_hline(yintercept = .2, col = "red", linetype = "longdash") +
scale_y_log10() +
coord_flip() +
theme_minimal() +
labs(title = "THM Results",
subtitle = "Gormsey")

“All models of reality are wrong, but some are useful.”
## # A tibble: 5 × 7
## Sample_No Date Sample_Point Town Measure Result Units
## <dbl> <date> <chr> <chr> <chr> <dbl> <chr>
## 1 646594 2069-02-06 ME_19428 Merton THM 0.763 mg/L
## 2 697871 2070-03-26 SO_12771 Southwold THM 0.475 mg/L
## 3 626268 2070-06-12 ME_16234 Merton THM 0.355 mg/L
## 4 674225 2070-08-06 ME_19428 Merton THM 0.268 mg/L
## 5 672190 2070-12-10 ME_16234 Merton THM 0.714 mg/L
Further investigation required to determine the cause
library(knitr)
thm_fail <- filter(gormsey, Measure == "THM" & Result > .25)
kable(select(thm_fail, Date, Town, Result),
caption = "Example output of the `kable()` function.",
digits = 2)

| Date | Town | Result |
|---|---|---|
| 2069-02-06 | Merton | 0.76 |
| 2070-03-26 | Southwold | 0.48 |
| 2070-06-12 | Merton | 0.36 |
| 2070-08-06 | Merton | 0.27 |
| 2070-12-10 | Merton | 0.71 |
## [1] 28
## [1] 27.87
## [1] 30
## [1] 27
## [1] 28
## [1] 27.875